Edges weighted with the combined score from the STRING database will be useful both as a baseline for comparison against our own method and for testing the community detection analysis before the weighted edges generated by our method are ready. Two options exist to get these weightings: querying the STRING online service, or reusing the pickled feature object created in the above notebook.
Unfortunately, the online service produces a table that does not include the Entrez IDs that were originally submitted, so its output would have to be mapped back to Entrez IDs for our pipeline. The fastest way is to use the pickled object created in the above notebook to generate features and take only the combined score values:
In [1]:
cd ../../features
In [2]:
import csv
In [3]:
ls
In [4]:
import sys
In [5]:
sys.path.append("/home/gavin/Documents/MRes/opencast-bio/")
In [6]:
import ocbio.string
In [7]:
import pickle
In [8]:
# open in binary mode so the pickle loads under both Python 2 and Python 3
with open("../string/human.Entrez.string.pickle", "rb") as f:
    stringfeatures = pickle.load(f)
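The lookup in the next cell indexes the feature object with a `frozenset` of the two Entrez IDs, so the order of the IDs in a pair does not matter. Below is a minimal sketch of that idea using a toy dictionary. It assumes the feature object behaves like a mapping from `frozenset` pairs to a feature vector whose last element is the combined score; the real object comes from `ocbio.string` and may differ. The names `toy_features` and `combined_score` and all values are hypothetical.

```python
# Toy stand-in for the unpickled feature object (hypothetical IDs and values).
toy_features = {
    frozenset(["1", "2"]): ["0", "0", "0.9"],
    frozenset(["3", "4"]): ["0", "0", "0"],
}


def combined_score(features, a, b):
    """Return the last feature of the pair, assumed to be the combined score."""
    # frozenset([a, b]) == frozenset([b, a]), so ID order is irrelevant
    return float(features[frozenset([a, b])][-1])


score_ab = combined_score(toy_features, "1", "2")
score_ba = combined_score(toy_features, "2", "1")
```

Both calls reach the same dictionary entry, which is why the pulldown pairs do not need to be sorted before the lookup.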
In [32]:
pulldownpairfile = open("../forGAVIN/pulldown_data/pulldown.interactions.Entrez.tsv")
stringedgefile = open("pulldown.string.edges.tsv", "w")
cp = csv.reader(pulldownpairfile, delimiter="\t")
cs = csv.writer(stringedgefile, delimiter="\t")
for l in cp:
    # look up each pair in the feature dictionary
    # and write only the pairs with a non-zero combined score
    pair = frozenset(l)
    combinedscore = float(stringfeatures[pair][-1])
    if combinedscore > 0.0000001:
        cs.writerow(l + [combinedscore])
pulldownpairfile.close()
stringedgefile.close()
In [33]:
!head pulldown.string.edges.tsv
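The filtering step above can also be wrapped in a small function, which makes it easier to test on toy data before running it on the full pulldown file. This is a sketch only: it assumes the feature object supports a dict-style `.get`, and it skips pairs that are absent rather than raising, which may or may not match how the real `ocbio.string` object behaves. The function name `write_nonzero_edges`, the `threshold` parameter and the toy data are all hypothetical.

```python
import csv
import io


def write_nonzero_edges(pairs, features, out, threshold=1e-7):
    """Write 'id1<TAB>id2<TAB>score' rows for pairs scoring above threshold."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    written = 0
    for pair in pairs:
        # assumption: features is dict-like; pairs missing from it are skipped
        vector = features.get(frozenset(pair))
        if vector is None:
            continue
        score = float(vector[-1])  # last feature assumed to be the combined score
        if score > threshold:
            writer.writerow(list(pair) + [score])
            written += 1
    return written


# Toy data (hypothetical IDs and scores), written to an in-memory buffer.
toy_features = {
    frozenset(["1", "2"]): ["0", "0", "0.9"],
    frozenset(["3", "4"]): ["0", "0", "0"],
}
buf = io.StringIO()
n_written = write_nonzero_edges(
    [["1", "2"], ["3", "4"], ["5", "6"]], toy_features, buf
)
```

Here only the first pair is written: the second has a zero score and the third is not in the toy dictionary. Passing an open file handle instead of the buffer would give the same tab-separated edge list as the cell above.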